feat(tools): add defineClientTool for client-resolved (HITL) tools #204
keesvandorp wants to merge 4 commits into
|
@keesvandorp is attempting to deploy a commit to the Vercel Team on Vercel. A member of the Team first needs to authorize it. |
|
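Thanks, addressed in d839f7b.
Reject mixed shapes (compiler + runtime). A client-resolved tool that also defines `execute` now throws instead of silently dropping the executor.
Regression 1, the one that matters: the new `client-resolved-question` eval.
Regression 2, separation: the new `approval-vs-client-resolved` eval.
Verified: eve typecheck + unit tests (incl. the mixed-shape rejection) + oxlint; the HITL fixture typechecks and `eve build`s with the override. |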
|
Nice update. The important thing I would preserve before merge is that the mixed-shape rejection and the resume regression stay paired. The compiler and runtime checks close the construction path, but the fixture is what proves the runtime history is actually reconstructing one result for the original parked call. The remaining review lens I would use is:
If the gateway-backed evals cannot run in ordinary CI, I would at least keep the fixture build/typecheck plus the mixed-shape unit tests as merge blockers, and treat live eval execution as release evidence before closing the original issue. Boundary: architecture and test feedback only; no claim about using this project or running its code. |
|
Agreed on all five, and on keeping the rejection and the resume regression paired — the compiler/runtime checks guard construction, the fixture proves the reconstructed history settles to one result for the parked call. Neither substitutes for the other. The gating maps cleanly onto the existing CI:
Locally verified on the branch: If it'd help, I'm happy to split the fixture's typecheck/build into an explicit gateway-free job so the construction guarantee gates independently of eval execution — just say the word. |
|
Yes, I would split that gateway-free job. The useful boundary is:
That split keeps CI fast while preserving the invariant that no executor-less path ships without a concrete fixture compiled against the authored surface. I would make the job name explicit enough that future maintainers know what it protects, for example client-resolved-hitl-construction, and keep it pinned to the authored ask_question override plus the approval-vs-client-resolved fixture build. Boundary: architecture and test feedback only; no claim about using this project or running its code. |
Splits the construction contract from the gateway-backed evals (per review on vercel#204). New merge-blocking job proves, with no model gateway:
- the client-resolved omit-execute + mixed-shape rejection unit guards, and
- that the authored HITL fixture (ask_question override + approval-vs-client-resolved fixture) typechecks and builds against the authored surface.
The gateway-backed `eve eval` (e2e-local / e2e-vercel) stays as runtime release evidence (single result on resume; approval-gated execution kept separate from client-resolved input). Keeps CI fast while guaranteeing no executor-less path ships without a fixture compiled against it.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Kees van Dorp <keesvandorp@me.com>
|
Done — split out as a dedicated merge-blocking job in It proves construction, deterministically:
The gateway-backed Job name is intentionally explicit and pinned to that fixture pair, with a header comment stating what it protects, so it's legible to future maintainers. Verified locally: unit guards 18/18, and the fixture |
Authored tools previously had to provide an `execute` (the compiler's
`normalizeToolDefinition` and the runtime's `resolveToolDefinition` both
called `expectFunction(execute)`). That made it impossible to author a
human-in-the-loop tool the way the built-in `ask_question` works — no
executor, the call parks for input and resolves out-of-band. Overriding
`ask_question` to widen its input schema forced an `execute`, whose
auto-result collided with the input response: two `tool_result` blocks for
one `tool_use` id, which the provider rejects on resume ("each tool_use
must have a single result").
Add `defineClientTool({ description, inputSchema, outputSchema? })`, which
stamps `clientResolved: true` and carries no `execute`:
- normalize-tool / schema-backed: allow omitting `execute` when
`clientResolved`; every other tool still requires it.
- resolve-tool: skip reattaching a live `execute` for client-resolved tools.
- The runtime already surfaces executeless tools as client-side (buildToolSet
/ wrapToolExecute return undefined), so no harness change is needed; the
resolved definition's `execute` is already Optional.
`defineTool` is unchanged and still requires `execute`. Passing `execute` to
`defineClientTool` throws.
Fixes vercel#203
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Kees van Dorp <keesvandorp@me.com>
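The contract described in the commit message above can be sketched roughly as follows. This is a minimal stand-in, not eve's source: the `Schema` placeholder, the `ClientToolInput` type, and the error message are assumptions made for illustration; only the behavior (stamp `clientResolved: true`, carry no `execute`, throw if one is passed) comes from the commit message.

```typescript
// Hypothetical stand-in, NOT eve's source: only the stated contract is modeled.
type Schema = unknown; // placeholder for the real schema type (an assumption)

interface ClientToolInput {
  description: string;
  inputSchema: Schema;
  outputSchema?: Schema;
}

interface ClientToolDefinition extends ClientToolInput {
  readonly clientResolved: true;
}

function defineClientTool(input: ClientToolInput): ClientToolDefinition {
  // Guard untyped callers: a client-resolved tool must not carry an executor.
  const record = input as unknown as Record<string, unknown>;
  if (record["execute"] !== undefined) {
    throw new TypeError("defineClientTool: `execute` is not allowed");
  }
  // Stamp the marker that later stages use to skip the `execute` requirement.
  return { ...input, clientResolved: true };
}

// Usage: the definition carries the marker and no executor.
const askQuestion = defineClientTool({
  description: "Ask the user a question and wait for the answer.",
  inputSchema: { type: "object" }, // illustrative schema value
});
```

The type already excludes `execute`, so the runtime check only matters for callers that bypass the type system (plain JavaScript or a cast).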
…ressions
Address review feedback on the defineClientTool contract:
- Reject mixed shapes at BOTH the compiler (normalize-tool) and runtime
(resolve-tool): a client-resolved tool that also defines `execute` now throws
instead of silently dropping the executor. (A non-client tool that omits
`execute` was already rejected.)
- e2e HITL fixture regressions:
- client-resolved-question: an authored, widened `ask_question`
(defineClientTool + typed `ui`) parks, resumes from a structured answer, and
continues into a downstream `note` tool — exactly one tool_result for the
parked call id. Before the fix this resume 400'd ("each tool_use must have a
single result"); a green resume + downstream call proves the single result.
- approval-vs-client-resolved: proves executable-with-approval and
client-resolved input are separate paths — approval runs the executor;
client input supplies the result.
Verified: eve typecheck + unit tests (incl. the mixed-shape rejection) + oxlint;
the HITL fixture typechecks (tsc) and `eve build`s with the override.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Kees van Dorp <keesvandorp@me.com>
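The mixed-shape rule from this commit can be illustrated with one small classifier. It is a sketch under assumptions: `ToolRecord`, `classifyTool`, and the error strings are hypothetical names, and the real checks live separately in normalize-tool and resolve-tool rather than in a shared function.

```typescript
// Hypothetical shapes and names; a sketch of the rule, not eve's actual checks.
interface ToolRecord {
  name: string;
  clientResolved?: boolean;
  execute?: unknown;
}

type ToolKind = "client-resolved" | "executable";

function classifyTool(record: ToolRecord): ToolKind {
  if (record.clientResolved === true) {
    // Mixed shape: reject loudly instead of silently dropping the executor.
    if (record.execute !== undefined) {
      throw new Error(`tool "${record.name}": client-resolved tools must not define execute`);
    }
    return "client-resolved";
  }
  // Every other tool still has to provide an executor.
  if (typeof record.execute !== "function") {
    throw new Error(`tool "${record.name}": execute must be a function`);
  }
  return "executable";
}

// Usage: the two valid shapes map to two separate paths.
const parked = classifyTool({ name: "ask_question", clientResolved: true });
const runnable = classifyTool({ name: "note", execute: () => "saved" });
```

Because the function returns exactly one kind or throws, a tool can never end up on both the executor path and the client-input path, which is the property the `approval-vs-client-resolved` regression is described as proving at runtime.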
Public API added in this PR needs docs (per CONTRIBUTING). Add a "Custom client-resolved tools" section to the human-in-the-loop page covering defineClientTool: no execute, the ask_question override for typed pickers, the parked-input contract, and that defineTool/defineClientTool are mutually exclusive (exactly one result). Cross-link from the tools overview.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Kees van Dorp <keesvandorp@me.com>
Fixes #203.
Problem
Authored tools are required to provide an `execute`: both the compiler (`normalizeToolDefinition` → `expectFunction(record.execute)`) and the runtime (`resolveToolDefinition` → `expectFunction(resolvedRecord.execute)`) reject a tool without one. That makes it impossible to author a tool that participates in the human-in-the-loop input flow the way the built-in `ask_question` does: no executor, the model emits the call, the harness parks it, and the user's answer becomes its single `tool_result`.
The practical consequence (from #203): overriding `ask_question` to widen its (fixed, `.strict()`) input schema for typed HITL pickers forces an `execute`, whose auto-result collides with the input response. The resumed turn then carries two `tool_result` blocks for one `tool_use` id and the provider rejects it ("each tool_use must have a single result").
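To make the collision concrete, here is a small checker for the invariant the provider enforces on a resumed turn. The `Block` type is a deliberately simplified, hypothetical view of the history (real payloads carry more fields), and `findResultCountViolations` is an illustrative name, not something from eve.

```typescript
// Hypothetical, simplified history blocks; real provider payloads carry more fields.
type Block =
  | { type: "tool_use"; id: string }
  | { type: "tool_result"; tool_use_id: string };

// Returns the ids of tool_use calls that do not have exactly one tool_result.
function findResultCountViolations(blocks: Block[]): string[] {
  const counts = new Map<string, number>();
  for (const block of blocks) {
    if (block.type === "tool_use") counts.set(block.id, 0);
  }
  for (const block of blocks) {
    // Only results that answer a known call are counted.
    if (block.type === "tool_result" && counts.has(block.tool_use_id)) {
      counts.set(block.tool_use_id, (counts.get(block.tool_use_id) ?? 0) + 1);
    }
  }
  return Array.from(counts.entries())
    .filter(([, count]) => count !== 1)
    .map(([id]) => id);
}

// Before the fix: the forced executor's auto-result plus the user's answer.
const collided: Block[] = [
  { type: "tool_use", id: "call_1" },
  { type: "tool_result", tool_use_id: "call_1" }, // auto-result from execute
  { type: "tool_result", tool_use_id: "call_1" }, // user's input response
];

// With a client-resolved tool: only the user's answer settles the call.
const settled: Block[] = [
  { type: "tool_use", id: "call_1" },
  { type: "tool_result", tool_use_id: "call_1" },
];

console.log(findResultCountViolations(collided)); // ["call_1"]
console.log(findResultCountViolations(settled)); // []
```

A still-parked call has zero results and would also be flagged, so a check like this only makes sense on the history that is sent on resume.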
Change
Add `defineClientTool({ description, inputSchema, outputSchema? })`: an authored tool with no `execute`, stamped `clientResolved: true`. eve never runs it; the call parks for input and resolves out-of-band, producing exactly one result.
- `internal/authored-definition/schema-backed.ts`: allow omitting `execute` when `clientResolved`; every other tool still requires it.
- `runtime/resolve-tool.ts`: skip reattaching a live `execute` for client-resolved tools.
- `public/definitions/tool.ts`: `defineClientTool` + `ClientToolDefinition`; passing `execute` throws.
- `public/tools/index.ts`: export from `eve/tools`.
No harness change is needed: the runtime already surfaces executeless tools as client-side (`buildToolSet` / `wrapToolExecute` return `undefined`) and `ResolvedToolDefinition.execute` is already `Optional`. This PR just lets authored tools reach that existing path. `defineTool` is unchanged and still requires `execute`.
Authoring
`agent/tools/ask_question.ts` with `defineClientTool` overrides the built-in question tool with a wider, typed schema while keeping native pause/resume: the parked `input.requested` carries the full typed input, so a client can render a dedicated widget from it.
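As a rough illustration of the client side, the sketch below picks a widget from a typed input and builds the single answer for the parked call. Everything here is an assumption for illustration: the `AskQuestionInput` variants, the `InputRequested` event shape (`toolCallId`, `input`), and the helper names are hypothetical, and only the idea of a typed `ui` field on a widened `ask_question` comes from the PR.

```typescript
// Hypothetical widened input for ask_question; variants and fields are illustrative.
type AskQuestionInput =
  | { ui: "text"; question: string }
  | { ui: "single-select"; question: string; options: string[] }
  | { ui: "date"; question: string };

// Hypothetical event shape: assumes the parked request exposes the call id and typed input.
interface InputRequested {
  toolCallId: string;
  input: AskQuestionInput;
}

// Chooses a widget from the discriminant; strings stand in for real UI components.
function pickWidget(event: InputRequested): string {
  const input = event.input;
  switch (input.ui) {
    case "text":
      return `TextBox(${input.question})`;
    case "single-select":
      return `Picker(${input.question}: ${input.options.join(" | ")})`;
    case "date":
      return `DatePicker(${input.question})`;
  }
}

// The user's answer is the only result produced for the parked call.
function answerFor(event: InputRequested, value: string): { toolCallId: string; output: string } {
  return { toolCallId: event.toolCallId, output: value };
}

const requested: InputRequested = {
  toolCallId: "call_1",
  input: { ui: "single-select", question: "Which plan?", options: ["Free", "Pro"] },
};

console.log(pickWidget(requested)); // Picker(Which plan?: Free | Pro)
console.log(answerFor(requested, "Pro")); // { toolCallId: "call_1", output: "Pro" }
```

A discriminated union keeps the widget choice exhaustive at compile time: adding a new `ui` variant without a matching case becomes a type error rather than a silently unrendered question.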
Tests
- `defineClientTool` brands the definition, marks it `clientResolved`, carries no `execute`, and throws when `execute` is supplied.
- `normalizeToolDefinition` accepts a client-resolved tool without `execute` and still rejects a non-client tool that omits it.
- `pnpm --filter eve typecheck` / `oxlint` / unit tests green.
Verified end-to-end in a downstream app (Next.js + `useEveAgent`): an `execute`less `ask_question` override parks (`input.requested` → `session.waiting`), resumes from the user's structured answer, produces a single `tool_result`, and the turn continues, with no duplicate-result `400`.
Notes
`execute`, which yields a duplicate `tool_result` when the same call parks for input #203 solution 1). Generalising the input-extraction park to executeless tools with arbitrary names (so HITL tools needn't be named `ask_question`) is a natural follow-up, kept out of this PR to keep the primitive focused.